hetoolkit-Vignette
APEM Ltd
hetoolkit-vignette.Rmd
Installation
# if(!require(pacman)) install.packages('pacman') pacman::p_load(dplyr,
# insight, lubridate, readr, downloader, readxl, RCurl, writexl, tidyr,
# stringr, tibble, htmlTable, devtools, roxygen2, plotly, ggnewscale, ggplot2,
# fasstr, lme4, sjmisc, mgcv, gridExtra, ggfortify, visreg, formatR, sf,
# ggrepel, reshape, grid, glmmTMB, remotes, merTools, GGally, plyr, imputeTS,
# ggpubr, geojsonio, mapview)
if (!require(pacman)) {
install.packages("pacman")
}
pacman::p_load(insight, RCurl, writexl, htmlTable, devtools, roxygen2, ggnewscale,
visreg, formatR, reshape, glmmTMB, remotes, merTools, GGally, imputeTS, sf, geojsonio,
mapview, plyr, rnrfa)
# Install hetoolkit from GitHub only if it is not already installed
if (!requireNamespace("hetoolkit", quietly = TRUE)) {
  remotes::install_github("APEM-LTD/hetoolkit")
}
## Load hetoolkit. load_all() is used here for building the pkgdown site.
## If running this code yourself, use library(hetoolkit) instead.
# library(hetoolkit)
load_all()
# remotes::install_github('aquaMetrics/rict')
# for development:
#   install.packages('devtools')
#   library(devtools)
#   devtools::load_all(export_all = FALSE)
#   install_deps()
# for public version (once live):
#   remotes::install_github('APEM-LTD/hetoolkit')
#   library(hetoolkit)
Introduction
The hetoolkit package comprises a collection of 21
functions for assembling, processing, visualising and modelling
hydro-ecological data. These are:
- import_nrfa for importing flow data from the National River Flow Archive (NRFA);
- import_hde for importing flow data from the Environment Agency (EA) Hydrology Data Explorer (HDE);
- import_flowfiles for importing flow data from local files;
- import_flow for importing flow data from a mix of the above sources;
- impute_flow for infilling missing records in daily flow time series for one or more sites (gauging stations) using either an interpolation or an equipercentile method;
- import_inv for importing macroinvertebrate sampling data from the EA Ecology and Fish Data Explorer;
- import_env for importing environmental base data from the EA Ecology and Fish Data Explorer;
- import_rhs for importing River Habitat Survey (RHS) data from the EA’s Open Data portal;
- predict_indices for calculating expected scores for macroinvertebrate indices using the RICT model (FBA 2020);
- calc_flowstats and calc_rfrstats for calculating summary statistics describing historical flow conditions;
- join_he for joining the above datasets;
- plot_heatmap for visualising and summarising gaps in time series data;
- plot_hev and shiny_hev for producing time series plots of biology and flow data;
- plot_sitepca for summarising environmental characteristics of biological sampling sites;
- plot_rngflows for visualising the range of flow conditions experienced historically at a site;
- model_cv and model_logocv for performing cross-validation on linear mixed-effects models and hierarchical generalized additive models;
- diag_lmer for generating a variety of diagnostic plots for a mixed-effects regression (lmer) model; and
- plot_predictions for visualising the time series predictions from a hydro-ecological model.
This vignette illustrates a typical workflow using a selection of 19 macroinvertebrate sampling sites from the Environment Agency’s National Drought Monitoring Network (NDMN).
Although the package has been developed with macroinvertebrate data in mind, the functions can be used with any kind of biological sampling data.
Meta-data file
Linking disparate datasets requires a look-up table of site ids. In this example, we load a table with four columns:
- biol_site_id = id of biology (in this case macroinvertebrate) sampling site.
- flow_site_id = id of paired flow gauging station.
- flow_input = vector specifying where to source the flow data for each station (either National River Flow Archive “NRFA”, Hydrology Data Explorer “HDE”, or local files “FLOWFILES”).
- rhs_survey_id = id of paired River Habitat Survey (RHS) (survey id, not site id, in case multiple surveys have been undertaken at a site).
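If no ready-made look-up table is available, one can be assembled by hand. The following is a minimal sketch, assuming the same four column names; the two rows reuse ids that appear in the master file used in this vignette, and my_lookup is a hypothetical object name rather than anything provided by the package.

```r
# Hypothetical hand-built look-up table (the object name is illustrative only)
my_lookup <- data.frame(
  biol_site_id  = c("34310", "55287"),
  flow_site_id  = c("2859TH", "029003"),
  flow_input    = c("HDE", "HDE"),
  rhs_survey_id = c("39880", "38660"),
  stringsAsFactors = FALSE
)

# Ids are stored as character so that leading zeros (e.g. "029003") are kept
str(my_lookup)
```

Storing every id as character avoids silent loss of leading zeros and keeps joins between tables consistent.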
# load master file
data("master_file")
# make id columns character vectors
master_file$biol_site_id <- as.character(master_file$biol_site_id)
master_file$rhs_survey_id <- as.character(master_file$rhs_survey_id)
# filter master file for selected sites of interest
master_data <- master_file %>%
  dplyr::filter(biol_site_id %in% c("34310", "34343", "34352", "55287", "55395", "55417",
    "55673", "55824", "55897", "56065", "56226", "54637", "54769", "54801", "80998",
    "56491", "54827", "77216", "52828"))
# view data
master_data
## # A tibble: 19 × 4
##    biol_site_id flow_site_id flow_input rhs_survey_id
##    <chr>        <chr>        <chr>      <chr>
##  1 34310        2859TH       HDE        39880
##  2 34343        4790TH       HDE        39881
##  3 34352        5420TH       HDE        39797
##  4 77216        F3105        HDE        39621
##  5 52828        4175         HDE        39804
##  6 55287        029003       HDE        38660
##  7 55395        029001       HDE        38669
##  8 55417        030017       HDE        38670
##  9 55673        032806       HDE        39205
## 10 55824        031031       HDE        39707
## 11 55897        033044       HDE        39798
## 12 56065        033051       HDE        39310
## 13 56226        033022       HDE        39839
## 14 54637        037005       HDE        39560
## 15 54769        035004       HDE        38952
## 16 54801        034003       HDE        39612
## 17 80998        034206       HDE        39891
## 18 56491        U33093       HDE        39758
## 19 54827        035002       HDE        39574
# get site lists, for use with functions
biolsites <- master_data$biol_site_id
flowsites <- master_data$flow_site_id
flowinputs <- master_data$flow_input
rhssurveys <- master_data$rhs_survey_id
Standardised Column Names
A number of standardised column names are used throughout the
hetoolkit package, and throughout this vignette and its
associated datasets. These include:
- biol_site_id = macroinvertebrate sampling site ids.
- flow_site_id = flow gauging station ids.
- rhs_survey_id = River Habitat Survey (RHS) ids (survey id, not site id, in case multiple surveys have been undertaken at a site).
- flow = flow data, as downloaded using the import_flow function.
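Local datasets often arrive with different column names. As a minimal sketch, assuming a hypothetical local table whose site column is called SITE_ID (the table, its values and that column name are all made up for illustration), the column can be renamed to the standardised name before the data are combined with the other datasets:

```r
# Hypothetical local data; the SITE_ID column name and the values are assumptions
local_biol <- data.frame(
  SITE_ID   = c(34310, 55287),
  BMWP_ASPT = c(5.1, 4.8)
)

# Rename to the standardised name and store the ids as character
names(local_biol)[names(local_biol) == "SITE_ID"] <- "biol_site_id"
local_biol$biol_site_id <- as.character(local_biol$biol_site_id)

names(local_biol)
```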
Prepare biology, ENV and RHS data
Import biology data
The import_inv function imports macroinvertebrate
sampling data from the Environment Agency’s Ecology and Fish Data
Explorer (EDE). The data can either be downloaded from https://environment.data.gov.uk/ecology-fish/downloads/INV_OPEN_DATA.zip
or read in from a local .csv or .rds file. Optionally, the data can be
filtered by site ID and sample date.
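The optional filtering can be pictured as a simple subset on site id and sample date. The sketch below reproduces that logic in base R on a small made-up table; it is not the package's internal code, and toy_samples, sites, start_date and end_date are hypothetical objects used only for illustration.

```r
# Made-up samples for illustration only
toy_samples <- data.frame(
  biol_site_id = c("34310", "34310", "99999"),
  SAMPLE_DATE  = as.Date(c("2009-06-01", "2015-06-01", "2015-06-01"))
)

sites      <- c("34310", "55287")
start_date <- as.Date("2010-01-01")
end_date   <- as.Date("2020-12-31")

# Keep rows for the requested sites that fall inside the date window
kept <- toy_samples[
  toy_samples$biol_site_id %in% sites &
    toy_samples$SAMPLE_DATE >= start_date &
    toy_samples$SAMPLE_DATE <= end_date,
]
kept
```

Only the 2015 sample from site 34310 passes both conditions: the 2009 sample is too early and site 99999 was not requested.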
Below, we use our list of site ids, biolsites, to filter the data from the
Ecology and Fish Data Explorer (EDE).
# Import biology data from EDE
biol_data <- import_inv(source = "parquet", sites = biolsites, start_date = "2010-01-01",
end_date = "2020-12-31")
# view biol_data
biol_data
## # A tibble: 512 × 58
##    biol_site_id SAMPLE_ID SAMPLE_VERSION REPLICATE_CODE SAMPLE_DATE SAMPLE_TYPE
##    <chr>        <chr>              <int> <chr>          <date>      <fct>
##  1 56065        605567                 1 NA             2010-03-16  SP
##  2 56065        611126                 1 NA             2010-07-28  SP
##  3 56065        614704                 1 NA             2010-10-12  SP
##  4 56065        621146                 1 NA             2011-03-29  SP
##  5 56065        625474                 1 NA             2011-06-14  SP
##  6 56065        627625                 1 NA             2011-08-06  SP
##  7 56065        630894                 1 NA             2011-10-12  SP
##  8 56065        633498                 1 NA             2011-12-09  SP
##  9 56065        634677                 1 NA             2012-02-08  SP
## 10 56065        635806                 1 NA             2012-04-12  SP
## # ℹ 502 more rows
## # ℹ 52 more variables: SAMPLE_TYPE_DESCRIPTION <fct>, SAMPLE_METHOD <fct>,
## # SAMPLE_METHOD_DESCRIPTION <fct>, SAMPLE_REASON <fct>, ANALYSIS_ID <int>,
## # DATE_OF_ANALYSIS <date>, ANALYSIS_TYPE <fct>,
## # ANALYSIS_TYPE_DESCRIPTION <fct>, ANALYSIS_METHOD <fct>,
## # ANALYSIS_METHOD_DESCRIPTION <fct>, BMWP_N_TAXA <int>, BMWP_TOTAL <int>,
## # BMWP_ASPT <dbl>, CCI_N_TAXA <int>, CCI_CS_TOTAL <int>, CCI_ASPT <dbl>, …
Optional: Join additional biology data
If the user has additional biology data in a separate local file (a .csv file in this example), it is possible to append this to the EDE download. The additional data must have the same column names as the EDE data to which it is appended.
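Because the two tables must share the same column names, it can help to compare them before binding. The following is a minimal sketch on two made-up data frames (ede_part and local_part are hypothetical stand-ins for the EDE download and the local file):

```r
# Made-up stand-ins for the EDE download and the local file
ede_part   <- data.frame(biol_site_id = "34310", BMWP_ASPT = 5.2)
local_part <- data.frame(biol_site_id = "55287", BMWP_TOTAL = 80)

# Columns present in one table but not in the other
only_in_ede   <- setdiff(names(ede_part), names(local_part))
only_in_local <- setdiff(names(local_part), names(ede_part))

# rbind() would fail on these tables, so report the mismatch instead
if (length(only_in_ede) > 0 || length(only_in_local) > 0) {
  message("Column mismatch: ", paste(c(only_in_ede, only_in_local), collapse = ", "))
}
```

If both setdiff() results are empty, the tables have matching column names and can be bound.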
# bind 2 biology data sets - one from EDE and one local file
# drop any unwanted variables/columns from the EDE download file
drops_bio <- c("SAMPLE_VERSION", "REPLICATE_CODE", "SAMPLE_TYPE", "SAMPLE_METHOD",
"ANALYSIS_TYPE", "ANALYSIS_METHOD", "IS_THIRD_PARTY_DATA", "WATERBODY_TYPE")
# drop unwanted variables
biol_data2 <- biol_data[, !(names(biol_data) %in% drops_bio)]
# read in additional biology data in csv format
biol_data_excel <- read.csv("data/biol_data_join.csv")
# format columns
biol_data_excel <- biol_data_excel %>%
dplyr::mutate(biol_site_id = as.character(biol_site_id))
# convert to tibble format
biol_data_excel <- as_tibble(biol_data_excel)
# bind datasets
biol_data_final <- rbind(biol_data2, biol_data_excel)
Import environmental data
The import_env function allows the user to download
environmental base data from the Environment Agency’s Ecology and Fish
Data Explorer.
The function either:
- downloads environmental data from https://environment.data.gov.uk/ecology-fish/downloads/INV_OPEN_DATA.zip
- or imports it from a local .csv or .rds file
Data can be optionally filtered by site ID.
When saving, the name of the rds file is hard-wired to: INV_OPEN_DATA_SITES_ALL.rds.
If saving prior to filtering, the name of the filtered rds file is hard-wired to: INV_OPEN_DATA_SITE_F.rds.
Below, we use our list biolsites to filter the data from
EDE.
# Import environmental data from EDE
env_data <- import_env(sites = biolsites)
# view env_data
env_data
## # A tibble: 19 × 34
## AGENCY_AREA REPORTING_AREA CATCHMENT WATERBODY_TYPE WATERBODY_TYPE_DESCR…¹
## <chr> <chr> <chr> <chr> <chr>
## 1 ANGLIAN - NOR… LINCOLNSHIRE … EAST LIN… WBRV RIVER: Natural/semi-n…
## 2 ANGLIAN - NOR… LINCOLNSHIRE … CENTRAL … WBRV RIVER: Natural/semi-n…
## 3 THAMES - NORT… HERTFORDSHIRE… RODING WBRV RIVER: Natural/semi-n…
## 4 ANGLIAN - EAS… EAST ANGLIA -… COLNE (A… WBRV RIVER: Natural/semi-n…
## 5 NORTH EAST - … YORKSHIRE HULL WBRV RIVER: Natural/semi-n…
## 6 MIDLANDS - CE… WEST MIDLANDS MEASE (U… WBRV RIVER: Natural/semi-n…
## 7 ANGLIAN - CEN… EAST ANGLIA -… IVEL WBRV RIVER: Natural/semi-n…
## 8 THAMES - NORT… HERTFORDSHIRE… COLNE (T… WBRV RIVER: Natural/semi-n…
## 9 ANGLIAN - CEN… EAST ANGLIA -… CAM (2) WBRV RIVER: Natural/semi-n…
## 10 ANGLIAN - CEN… EAST ANGLIA -… LT OUSE,… WBRV RIVER: Natural/semi-n…
## 11 ANGLIAN - NOR… LINCOLNSHIRE … WELLAND WBRV RIVER: Natural/semi-n…
## 12 ANGLIAN - NOR… LINCOLNSHIRE … EAST LIN… WBRV RIVER: Natural/semi-n…
## 13 ANGLIAN - EAS… EAST ANGLIA -… ALDE AND… WBRV RIVER: Natural/semi-n…
## 14 ANGLIAN - CEN… EAST ANGLIA -… UPPER OU… WBRV RIVER: Natural/semi-n…
## 15 ANGLIAN - EAS… EAST ANGLIA -… BURE WBRV RIVER: Natural/semi-n…
## 16 ANGLIAN - EAS… EAST ANGLIA -… DEBEN WBRV RIVER: Natural/semi-n…
## 17 THAMES - NORT… HERTFORDSHIRE… MIMRAM WBRV RIVER: Natural/semi-n…
## 18 ANGLIAN - NOR… LINCOLNSHIRE … UPPER NE… WBRV RIVER: Natural/semi-n…
## 19 ANGLIAN - EAS… EAST ANGLIA -… WAVENEY … WBRV RIVER: Natural/semi-n…
## # ℹ abbreviated name: ¹WATERBODY_TYPE_DESCRIPTION
## # ℹ 29 more variables: WATER_BODY <chr>, biol_site_id <chr>,
## # SITE_VERSION <int>, NGR_PREFIX <chr>, EASTING <chr>, NORTHING <chr>,
## # NGR_10_FIG <chr>, FULL_EASTING <int>, FULL_NORTHING <int>,
## # WFD_WATERBODY_ID <chr>, ALTITUDE <dbl>, SLOPE <dbl>,
## # DIST_FROM_SOURCE <dbl>, DISCHARGE <dbl>, WIDTH <dbl>, DEPTH <dbl>,
## # BOULDERS_COBBLES <dbl>, PEBBLES_GRAVEL <dbl>, SAND <dbl>, …
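After an import, it is worth checking that every requested site was returned. A minimal sketch, using short illustrative vectors in place of biolsites and env_data$biol_site_id (requested and returned are hypothetical names):

```r
# Illustrative vectors; in the workflow these would be biolsites and
# env_data$biol_site_id
requested <- c("34310", "34343", "34352")
returned  <- c("34310", "34352")

# Sites that were requested but are absent from the imported data
missing_sites <- setdiff(requested, returned)
missing_sites
```

An empty result indicates that all requested sites are present in the imported table.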
Optional: Map biology sites
Get data
First we download data for our basemap: a map of England showing the EA's public-facing area boundaries.
We then take the environmental base data downloaded from the
Ecology and Fish Data Explorer using import_env, which includes each
site's National Grid Reference (NGR). We convert the NGRs to latitude and longitude
(WGS84) and join the result back to env_data, so that site information can be
included in the plot.
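To make clear what an NGR encodes, the sketch below decodes a grid reference into British National Grid easting and northing in base R. It assumes a two-letter prefix followed by an even number of digits; ngr_to_bng is a hypothetical helper written for illustration, not part of hetoolkit or rnrfa, and the grid references used are arbitrary examples. The further step from easting/northing to latitude/longitude is left to a dedicated function such as osg_parse.

```r
# Hypothetical helper: decode an OS National Grid Reference (two letters plus
# an even number of digits) into easting and northing in metres
ngr_to_bng <- function(ngr) {
  ngr <- toupper(gsub("\\s+", "", ngr))

  # The grid uses a 25-letter alphabet with no "I"; index letters from 0
  letters25 <- setdiff(LETTERS, "I")
  l1 <- match(substr(ngr, 1, 1), letters25) - 1
  l2 <- match(substr(ngr, 2, 2), letters25) - 1

  # Position of the 100 km square, counted from the grid's false origin
  e100 <- ((l1 - 2) %% 5) * 5 + (l2 %% 5)
  n100 <- (19 - (l1 %/% 5) * 5) - (l2 %/% 5)

  # First half of the digits is the easting, second half the northing
  digits <- substring(ngr, 3)
  half   <- nchar(digits) / 2
  scale  <- 10^(5 - half)  # 10 digits give 1 m resolution, 6 digits give 100 m

  data.frame(
    easting  = e100 * 1e5 + as.numeric(substr(digits, 1, half)) * scale,
    northing = n100 * 1e5 + as.numeric(substr(digits, half + 1, 2 * half)) * scale
  )
}

ngr_to_bng(c("TQ7220021300", "NN1234567890"))
```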
# Get EA public facing area boundaries
url_request <- "https://environment.data.gov.uk/arcgis/rest/services/EA/AdminBoundEAandNEpublicFaceAreas/FeatureServer/0/query?where=seaward='No'&outFields=*&f=geojson"
ea.areas <- st_read(url_request)
## Reading layer `OGRGeoJSON' from data source
## `https://environment.data.gov.uk/arcgis/rest/services/EA/AdminBoundEAandNEpublicFaceAreas/FeatureServer/0/query?where=seaward='No'&outFields=*&f=geojson'
## using driver `GeoJSON'
## Simple feature collection with 14 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -6.419008 ymin: 49.86463 xmax: 1.768937 ymax: 55.81166
## Geodetic CRS: WGS 84
## Convert national grid ref (NGR) to full lat / long from env_data (from import_env function)
## WGS84 is lat/long.
temp.eastnorths <- osg_parse(env_data$NGR_10_FIG, coord_system = "WGS84") %>% as_tibble()
## match back to env data to give details on map
env_data_map <- cbind(env_data, temp.eastnorths) %>%
  dplyr::select(AGENCY_AREA, WATER_BODY, CATCHMENT, WATERBODY_TYPE, biol_site_id, lat, lon)
Create the map
Finally we use mapview to plot the EA areas and points indicating the sample sites. The points and polygons are labelled with the biology sample site ID and the EA area code respectively. More data for each site is available by clicking on the point.
## Create map
mapview(ea.areas, alpha.regions = 0.2, label=ea.areas$code) +
mapview(env_data_map, xcol = "lon", ycol = "lat", label=env_data_map$biol_site_id, grid = FALSE)